Recognition device and method

ABSTRACT

A recognition device which judges whether a target is identical with a predetermined reference, comprising: a holding unit which holds the target; multiple deformation units which deform the target held by the holding unit with at least one degree of flexibility in deformation; multiple deformed amount estimation units which correspond to the deformation units in a one-to-one relationship and estimate a deformed amount of the target from the reference with respect to the flexibility in deformation according to the corresponding deformation unit; an estimated error evaluation unit which evaluates an estimated error of the deformed amount estimated by the deformed amount estimation unit; an adjustment unit which operates any of the deformation units with precedence according to the estimated error evaluated by the estimated error evaluation unit; a similarity calculation unit which calculates a similarity between the reference and the target which is deformed by the deformation unit operated with precedence by the adjustment unit; and a judgment unit which judges whether the target is identical with the reference according to the similarity calculated by the similarity calculation unit.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to a recognition device and method, and more particularly to a recognition device and method which perform a computation on a target such as an image or voice to recognize whether it matches a predetermined reference.

[0003] 2. Description of the Related Art

[0004] A recognition target such as an object or the face of a person can, in principle, be recognized from its image by calculating a similarity between the input image and a template image of a previously stored reference.

[0005] However, the image of an actual recognition target varies greatly depending on environmental conditions such as the direction in which the recognition target is placed, its distance and the lighting. Therefore, an enormous number of templates corresponding to the image variations must be prepared, and the computational quantity required to calculate the similarity between the input image and the templates also becomes enormous.

[0006] Therefore, a method which normalizes the input image to a predetermined position, inclination, size and the like by geometrical transformation or the like is effective. Normalization reduces the number of template images to be compared and allows the recognition processing to run in realistic computing time.

[0007] As a normalization method, there is a known method which extracts feature points from the input image and fits the extracted feature points to a shape model of a prescribed normalized image so as to normalize the input image. As a typical feature point extraction method, a method using an edge operator is known, but a clear edge may not be obtained when an object has a smooth surface shape such as a face, and edges are greatly susceptible to lighting conditions.

[0008] Meanwhile, the treatise “Rotation Invariant Neural Network-Based Face Detection” (H. A. Rowley, S. Baluja and T. Kanade, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1998, pp. 38-44) discloses a technique which detects a deviation from a normalized image directly from the light and dark pattern of an input image and uses the detected value to normalize the input image. According to the treatise, the tilt angle of a face is detected from a tilted face image by a neural net, the detected angle is used to make the face image upright, and it is then recognized whether the input image is a face image. This method can estimate an angle robustly against changes in the input image by virtue of the generalization ability of the neural net, and a normalized image can be obtained stably.

[0009] However, the technique described in the above treatise needs to estimate the tilt angle accurately over all angles. Therefore, all angles must be covered by the learning samples, with the disadvantages that a large number of learning samples must be prepared and learning takes a long time.

[0010] Besides, the above treatise covers only rotation within the image plane as the flexibility of deformation of an input image, but an actual image has a high degree of flexibility such as rotation in the depth direction, size, position, lighting and the like, resulting in a more serious problem.

[0011] In other words, learning samples varied independently with respect to each degree of flexibility are required in order to estimate many degrees of flexibility accurately at the same time, so an enormous quantity of learning samples is required, namely the product of the numbers of samples required for the respective degrees of flexibility. Accordingly, it is impossible to complete learning in a realistic time.

[0012] Under the circumstances described above, the present invention provides a recognition device and method which can normalize a target such as an image or voice even if the target varies largely and which can learn with ease.

SUMMARY OF THE INVENTION

[0013] The present invention has been made in view of the above circumstances, and an aspect of the present invention is a recognition device which judges whether a target is identical with a predetermined reference, comprising: a holding unit which holds the target; multiple deformation units which deform the target held by the holding unit with at least one degree of flexibility in deformation; multiple deformed amount estimation units which correspond to the deformation units in a one-to-one relationship and estimate a deformed amount of the target from the reference with respect to the flexibility in deformation according to the corresponding deformation unit; an estimated error evaluation unit which evaluates an estimated error of the deformed amount estimated by the deformed amount estimation unit; an adjustment unit which operates any of the deformation units with precedence according to the estimated error evaluated by the estimated error evaluation unit; a similarity calculation unit which calculates a similarity between the reference and the target which is deformed by the deformation unit operated with precedence by the adjustment unit; and a judgment unit which judges whether the target is identical with the reference according to the similarity calculated by the similarity calculation unit.

[0014] Another aspect of the present invention is a recognition method for determining whether a target is identical with a predetermined reference, comprising: estimating a deformed amount of the target from the reference according to multiple degrees of freedom for deformation; evaluating an estimated error of the estimated deformed amount; deforming the target with the degree of freedom for deformation having the evaluated estimated error at a minimum level; calculating a similarity between the deformed target and the reference; and judging whether the target is identical with the reference according to the calculated similarity.

[0015] According to the present invention, the deformed amount of the target from the reference is estimated according to the multiple degrees of deformation flexibility, the estimated error of the estimated deformed amount is evaluated at the same time, the target is deformed with the deformation flexibility having the minimum evaluated estimated error, a similarity between the deformed target and the reference is calculated, and it is judged whether the target is identical with the reference according to the calculated similarity. Therefore, even when the environment of the target changes largely, an image recognition device which can recognize the target can be realized with realistic resources, and learning can be performed easily with fewer learning samples.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016]FIG. 1 is a block diagram showing a functional structure of a recognition device to which the present invention is applied;

[0017]FIG. 2 is a diagram showing a structure of an image-deformed amount estimation section 5;

[0018]FIG. 3 is a diagram illustrating an auto-encoder;

[0019]FIG. 4 is a diagram illustrating a nonlinear subspace;

[0020]FIG. 5 is a diagram showing a relationship between a projection value to a subspace and an image rotation angle;

[0021]FIG. 6A and FIG. 6B are diagrams showing distributions of learning samples;

[0022]FIG. 7 is a diagram showing a comparison of the number of learning samples of Embodiment 1 and the prior art;

[0023]FIG. 8 is a flow chart showing a flow of an operation of a recognition device when it is learning;

[0024]FIG. 9 is a flow chart (1) showing a flow of an operation of the recognition device when it is recognizing;

[0025]FIG. 10 is a flow chart (2) showing a flow of an operation of the recognition device when it is recognizing;

[0026]FIG. 11 is a diagram showing a state of changes of an image stored in an image holding section 2;

[0027]FIG. 12 is a graph showing the changes of image angles and deviations from the center when eight samples are recognized;

[0028]FIG. 13 shows diagrams of relationships between particular projection values and rotation angles of images in Embodiment 2; and

[0029]FIG. 14 shows diagrams of a method of changing input voices in Embodiment 3.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0030] Preferred embodiments of a recognition device and method according to the present invention will be described in detail with reference to the accompanying drawings.

[0031] [Embodiment 1]

[0032]FIG. 1 is a block diagram showing a functional structure of a recognition device to which the present invention is applied.

[0033] As shown in FIG. 1, the recognition device comprises an image input section 1, an image holding section 2, a similarity calculation section 3, a judgment section 4, multiple image-deformed amount estimation sections 5 (5-1 to 5-n), multiple image deformation sections 6 (6-1 to 6-n) and an image deformation adjustment section 7.

[0034] The image input section 1 inputs an image to be recognized and comprises, for example, a CCD camera.

[0035] The image holding section 2 holds the target image to be recognized input by the image input section 1. The image is held as a vector pattern which, for example, has the luminance of each pixel of the image as a component and a dimension equal to the number of pixels. The held image is deformed appropriately by the image deformation section 6, and the deformed image is held again by the image holding section 2.

[0036] The similarity calculation section 3 calculates a similarity between the image held by the image holding section 2 and the template image of a reference designated in advance. The similarity can be indicated by a Euclidean distance between the image held by the image holding section 2 and the template image, namely the square root of the sum of squares of the differences in luminance of corresponding pixels, by a distance between the image held by the image holding section 2 and a subspace approximating the distribution of multiple template images, and the like. For example, when the inverse of such a distance is taken as the similarity, the similarity becomes higher as the distance becomes smaller. The similarity calculation section 3 can also serve as an estimated error evaluation section 14 to be described later.
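The inverse-distance similarity described above can be sketched in a few lines. This is an illustrative sketch, not the patented implementation; the epsilon guard against division by zero is an added assumption.

```python
import numpy as np

def similarity(held_image: np.ndarray, template: np.ndarray) -> float:
    """Inverse Euclidean-distance similarity; larger means more alike."""
    # Square root of the sum of squared luminance differences of corresponding pixels.
    distance = np.linalg.norm(held_image - template)
    return 1.0 / (distance + 1e-12)  # epsilon avoids division by zero (assumption)
```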

[0037] The judgment section 4 compares the similarity calculated by the similarity calculation section 3 with a predetermined threshold and determines that the target is identical with the reference when the similarity is larger than the threshold.

[0038] The image-deformed amount estimation sections 5 (5-1 to 5-n) each estimate a level of deformation of the image held by the image holding section 2 from the template image with respect to the flexibility of a different image deformation. The flexibility of image deformation includes, for example, positional displacement, rotation, enlargement/reduction and the like. Each image-deformed amount estimation section 5 also evaluates the precision of the estimated image deformation and outputs it.

[0039] The image deformation sections 6 (6-1 to 6-n) correspond to the image-deformed amount estimation sections 5 (5-1 to 5-n) in a one-to-one relationship and deform the image held by the image holding section 2 so that it resembles the template image. The flexibility of image deformation manipulated by each image deformation section 6 is the same as the image deformation flexibility of the corresponding image-deformed amount estimation section 5, and the degree of deformation manipulation is determined according to the estimate of the image-deformed amount and its precision from the corresponding image-deformed amount estimation section 5. The degree of deformation manipulation is the value obtained by multiplying the estimate of the image-deformed amount by a deceleration factor. The deceleration factor is 1 when the estimated error is 0 and approaches 0 as the estimated error becomes larger. The direction of image deformation manipulation is the direction in which the image held by the image holding section 2 approaches the template. In other words, the smaller the estimated error is, the fewer manipulations are needed for the image held by the image holding section 2 to approach the template image.

[0040] The image deformation adjustment section 7 adjusts the image manipulation by each of the image deformation sections 6. The adjustment is made according to the estimated error of the estimate of the image-deformed amount sent from each of the image-deformed amount estimation sections 5, and the image deformation section 6 with the smallest value is operated with higher priority. However, an image deformation section 6 whose image deformation estimate is at or below a prescribed value is not operated.
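The deceleration factor of [0039] and the priority rule of [0040] might be sketched as follows. The patent only states that the factor is 1 at zero error and approaches 0 for large error, so the 1/(1 + k·error) form below, and all names, are assumptions.

```python
def manipulation_amount(estimate: float, error: float, k: float = 1.0) -> float:
    """Deformed-amount estimate scaled by a deceleration factor that is 1
    when the estimated error is 0 and tends to 0 as the error grows."""
    deceleration = 1.0 / (1.0 + k * error)  # one possible monotone choice
    return estimate * deceleration

def choose_section(estimates, errors, prescribed_value: float):
    """Operate the deformation section with the smallest estimated error,
    skipping sections whose deformed-amount estimate is at or below the
    prescribed value (those degrees of freedom are already normalized)."""
    candidates = [i for i, e in enumerate(estimates) if abs(e) > prescribed_value]
    if not candidates:
        return None
    return min(candidates, key=lambda i: errors[i])
```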

[0041] Next, the structure of the image-deformed amount estimation section 5 will be described in detail. FIG. 2 is a diagram showing the structure of the image-deformed amount estimation section 5. As shown in FIG. 2, each image-deformed amount estimation section 5 comprises a feature pattern generation section 11, a subspace projection calculation section 12, a deformed amount estimation section 13, and an estimated error evaluation section 14.

[0042] The feature pattern generation section 11 generates a feature pattern by a predetermined transformation from the vector pattern indicating the image input from the image holding section 2. This conversion method can be, for example, a method which projects the vector pattern onto a finite set of base vectors having large characteristic values, obtained by previously analyzing the main components of desired multiple images, a method which performs a Fourier transformation of the vector pattern, a method which performs a wavelet transformation of the vector pattern, or the like.

[0043] The subspace projection calculation section 12 learns in advance a subspace which approximates the distribution of the feature patterns of the reference image and, when the feature pattern of the target image is input, projects the feature pattern onto the subspace and outputs the coordinate values of the projection in a coordinate system on the subspace. Here, the projection vector of the feature pattern onto the subspace is determined to be the point on the subspace which is nearest to the feature pattern. Because changes of the feature pattern are very complex in the real world, a nonlinear subspace is preferable for approximating the changes.

[0044] The nonlinear subspace can be represented by, for example, a neural net called an auto-encoder, as shown in FIG. 3. The auto-encoder is a kind of multilayer perceptron in which, as shown in FIG. 3, the number of neurons n of an input layer 21 is equal to the number of neurons n of an output layer 22, and the number of neurons of an intermediate layer 23 is smaller than the number of neurons of the input layer 21. The same value as that input to the input layer 21 is given as a teacher signal to the neurons of the output layer 22, and the weights of the synaptic connections are learned so as to realize an identity mapping. Learning can be performed by the ordinary back-propagation method.

[0045] As shown in FIG. 4, the output of the output layer 22 of the learned auto-encoder defines a nonlinear subspace 32 which approximates the distribution of learning samples 31 within the n-dimensional space representing the input, and the output of the neurons of the intermediate layer 23 of the auto-encoder corresponds to the coordinate components of a coordinate system 34 on the nonlinear subspace 32. Therefore, the output of the output layer 22 when a feature pattern 33 is input to the learned auto-encoder becomes the projection 35 of the feature pattern 33 onto the nonlinear subspace 32, and the output of the neurons of the intermediate layer 23 expresses this projection vector in the coordinate system on the nonlinear subspace 32.
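A minimal numpy sketch of such an auto-encoder follows. The layer sizes (a 50-dimensional feature at input and output, wider second and fourth layers, a two-neuron intermediate layer as in the rotation example of FIG. 5) and the learning rate are assumptions; the teacher signal is the input itself, as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed five-layer auto-encoder: 50 -> 60 -> 2 -> 60 -> 50.
sizes = [50, 60, 2, 60, 50]
W = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    """Return the activations of every layer: tanh hidden units, linear output."""
    acts = [x]
    for i, (Wi, bi) in enumerate(zip(W, b)):
        z = Wi @ acts[-1] + bi
        acts.append(z if i == len(W) - 1 else np.tanh(z))
    return acts

def train_step(x, lr=0.01):
    """One back-propagation step toward the identity mapping (teacher = input)."""
    acts = forward(x)
    delta = acts[-1] - x  # error signal at the output layer
    for i in reversed(range(len(W))):
        grad_W = np.outer(delta, acts[i])
        prev_delta = (W[i].T @ delta) * (1.0 - acts[i] ** 2) if i > 0 else None
        W[i] -= lr * grad_W  # update after computing the next delta
        b[i] -= lr * delta
        delta = prev_delta
    return float(np.sum((acts[-1] - x) ** 2))

# After training, forward(x)[-1] is the projection of feature x onto the
# nonlinear subspace, and forward(x)[2] gives its two subspace coordinates
# (the intermediate-layer output).
```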

[0046] The deformed amount estimation section 13 learns in advance a relationship between the projection of a feature pattern onto the subspace, which is output from the subspace projection calculation section 12, and an image-deformed amount, and uses it to determine the image-deformed amount from the projection value and output it. The relationship between the projection onto the subspace and the image-deformed amount becomes, for example, the relationship shown in FIG. 5 when the subspace is represented by an auto-encoder whose intermediate layer 23 has two neurons and the flexibility is rotation.

[0047]FIG. 5 shows the output of the two neurons of the intermediate layer 23 plotted two-dimensionally; its locus draws a closed curve according to the rotation angle of the image. Because the points on the closed curve and the rotation angles of the image correspond one to one, the rotation angle of the image can be estimated from the output of the two neurons of the intermediate layer 23. The relationship between the output of the neurons of the intermediate layer 23 and the image-deformed amount is stored as an approximate function in the deformed amount estimation section 13 or stored as a lookup table in a memory (not shown).

[0048] The estimated error evaluation section 14 calculates a distance d between the feature pattern 33 and the nonlinear subspace 32 as shown in FIG. 4 and outputs the distance d as the precision of the image-deformed amount estimated by the deformed amount estimation section 13. Here, the distance between the feature pattern 33 and the nonlinear subspace 32 is the Euclidean distance between the feature pattern 33 and its projection vector 35 onto the nonlinear subspace 32. For example, when the nonlinear subspace 32 is represented by the auto-encoder, it can be indicated by the Euclidean distance between the input and output of the auto-encoder. The feature pattern 33 can be regarded as better approximated by the nonlinear subspace 32 as the distance between them is smaller, so it is appropriate to use d as the precision of the image-deformed amount. It is to be understood that any monotone increasing function of d can be used.
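Continuing the sketch above (it reuses the hypothetical `forward` function), the lookup-table estimation of [0047] and the distance-based precision of [0048] might look like this; nearest-neighbor lookup is one possible reading of the table, not the only one.

```python
import numpy as np

def build_lookup(features, angles):
    """Pair the intermediate-layer output with the known rotation angle of
    each learning sample (the lookup table of [0047])."""
    return [(forward(x)[2].copy(), a) for x, a in zip(features, angles)]

def estimate_rotation(x, table):
    """Estimate the rotation angle from the nearest lookup-table entry and
    return the input/output distance d as the precision of the estimate."""
    acts = forward(x)
    coords, reconstruction = acts[2], acts[-1]
    _, angle = min(table, key=lambda entry: np.linalg.norm(entry[0] - coords))
    d = float(np.linalg.norm(x - reconstruction))  # distance to the nonlinear subspace
    return angle, d
```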

[0049] The learning samples used to make the recognition device shown in FIG. 1 learn will now be described. To keep matters simple, it is assumed that the recognition target changes its inclination and position within the field of view of the observer but its distance from the observer does not change. Specifically, the flexibility of deformation of an image of the recognition target is limited, for example, to rotation in the image plane and movement in the image plane. It is naturally possible to provide other degrees of flexibility.

[0050] Two image-deformed amount estimation sections 5 are provided to match the flexibility of deformation of the image: a first image-deformed amount estimation section 5-1 handles rotation within the image plane and a second image-deformed amount estimation section 5-2 handles movement within the image plane. The learning samples of the first image-deformed amount estimation section 5-1 are multiple erect images of a target positioned at the center of the observer's view that are rotated and shifted at the same time. The rotation angle is varied with random numbers ranging, for example, from −180 degrees to 180 degrees, as shown in FIG. 6A and FIG. 6B, and the degree of shift is varied with random numbers in a Gaussian distribution of, for example, a width of 6 pixels in the vertical and horizontal directions, as shown in FIG. 6A and FIG. 6B. The number of samples prepared is, for example, 1000 images.

[0051] Similarly, for the second image-deformed amount estimation section 5-2, multiple erect images of the target positioned at the center of the observer's view are prepared as the learning samples; the images are rotated and shifted at the same time, the rotation angle being varied with random numbers in a Gaussian distribution of, for example, a width of 10 degrees, as shown in FIG. 6A and FIG. 6B, and the degree of shift being varied with uniform random numbers of, for example, from −6 pixels to 6 pixels in the vertical and horizontal directions, as shown in FIG. 6A and FIG. 6B. The number of samples prepared is, for example, 1000 images.

[0052] As described above, according to the present invention, substantially the same number of learning samples may be prepared for each degree of flexibility of image deformation to be recognized, so the number of learning samples is proportional to the number of degrees of flexibility of image deformation, as shown in FIG. 7.

[0053] Meanwhile, because the prior art needs to prepare learning samples for all combinations of the degrees of flexibility of image deformation to be recognized, the number of learning samples increases as the product of the numbers of samples for the respective degrees of flexibility. Therefore, particularly when the flexibility of image deformation increases, the number of learning samples is less than in the prior art, and the learning time becomes short according to the invention.

[0054] Moreover, the resources needed to express a subspace approximating the distribution of the learning samples increase as the number of learning samples increases; for the auto-encoder, for example, this is the number of neurons or synapses of the intermediate layers. Accordingly, the calculation time for recognition also increases. The present invention is therefore also effective in reducing the resources and decreasing the recognition time.

[0055] Next, the operation of the recognition device shown in FIG. 1 when it learns will be described. FIG. 8 is a flow chart showing the flow of the operation when the recognition device learns.

[0056] Learning can be performed independently for the first image-deformed amount estimation section 5-1 and the second image-deformed amount estimation section 5-2. Here, learning by the first image-deformed amount estimation section 5-1 will be described. Learning by the second image-deformed amount estimation section 5-2 follows the same procedure.

[0057] First, the auto-encoder of the first image-deformed amount estimation section 5-1 is initialized (step 101). Initialization sets the number of neurons of the input layer and the output layer to the number of dimensions of a feature pattern, and the number of neurons of the third layer to 2, which corresponds to the flexibility of the image rotation. The number of neurons of the second layer and the fourth layer is set to at least the number of dimensions of a feature pattern. Besides, the weight of each synapse is initialized with random numbers.

[0058] Then, for example, a learning sample of 27×27 pixels is input to the image input section 1 and held by the image holding section 2 in the form of a 729-dimensional vector which has the luminance value of each pixel as a component (step 102). Subsequently, the image held by the image holding section 2 is sent to the feature pattern generation section 11-1, projected onto, for example, a prescribed 50-dimensional linear subspace, and converted into a 50-dimensional feature pattern (step 103). Here, the predetermined linear subspace is a subspace which has as its basis the top 50 characteristic vectors obtained by analyzing the main components of, for example, multiple given 27×27-pixel images, and the projection onto the subspace has the effect of compressing the amount of information while substantially keeping the vector size.
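A sketch of the feature pattern generation of step 103 follows, assuming plain principal component analysis over a stack of flattened 27×27 images; the function names are illustrative.

```python
import numpy as np

def pca_basis(images: np.ndarray, k: int = 50) -> np.ndarray:
    """Top-k characteristic vectors of a stack of 729-dimensional image
    vectors (principal component analysis); the rows form the basis."""
    centered = images - images.mean(axis=0)
    cov = centered.T @ centered / len(images)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    top = np.argsort(eigenvalues)[::-1][:k]
    return eigenvectors[:, top].T

def to_feature(image_27x27: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Flatten a 27x27 luminance image to a 729-dimensional vector and
    project it onto the 50-dimensional linear subspace (step 103)."""
    return basis @ image_27x27.reshape(-1)
```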

[0059] Then, the feature pattern is input to the auto-encoder and also given as the teacher signal of the auto-encoder at the same time. The weight of each synapse is updated by a conventional technique, the back-propagation method, to decrease the square error between the output of the auto-encoder and the teacher signal, so that the subspace is learned (step 104). Details of the back-propagation method will not be given here because it is well known. The square error between the output of the auto-encoder and the teacher signal is averaged over all learning samples, and learning is continued until its value becomes smaller than a prescribed value (NO in step 105). When the square error becomes smaller than the prescribed value, namely when learning of the subspace converges (YES in step 105), learning of the deformed amount estimation section 13-1 is performed.

[0060] In learning of the deformed amount estimation section 13-1, a learning sample is input again (step 106), the learning sample is converted into a feature pattern (step 107), and the feature pattern is input to the auto-encoder to calculate a projection value (step 108). The output of the two neurons of the intermediate layer of the auto-encoder is input to the deformed amount estimation section 13-1, and the angle of rotation applied to generate the learning sample is also input (step 109). As the learning of the deformed amount estimation section 13-1, a lookup table from the output of the two neurons of the intermediate layer 23 to rotation angles is prepared (step 110). The above processing is performed on all learning samples (NO in step 111), and when the processing of all learning samples is completed (YES in step 111), the learning is terminated.

[0061] Here, learning of the respective image-deformed amount estimation sections 5 was described as independent, but the respective image-deformed amount estimation sections 5 can also be related to one another for learning. In such a learning method, a learning sample is input to, for example, all the image-deformed amount estimation sections 5, the image-deformed amount estimation section 5 which has obtained the best result (minimum distance) is made to learn, and the same procedure is repeated.

[0062] Next, the operation for recognition by the recognition device shown in FIG. 1 will be described. FIG. 9 and FIG. 10 are flow charts showing the flow of the operation for recognition by the recognition device.

[0063] In the recognition processing, first, a target image, e.g., an image of 27×27 pixels, is input to the image input section 1 and held by the image holding section 2 in the form of a 729-dimensional vector with the luminance value of each pixel as a component (step 201). Subsequently, the value j of a counter for counting the number of image manipulations is initialized to 0 (step 202), and the counter value j is increased by one (step 203).

[0064] Then, the image held by the image holding section 2 is sent to the first feature pattern generation section 11-1, where it is projected onto a predetermined, for example, 50-dimensional linear subspace and transformed into a 50-dimensional feature pattern (step 204A). This feature pattern is input to the first subspace projection calculation section 12-1, and the projection value of the feature pattern onto the subspace and the distance between the feature pattern and the subspace are calculated (step 205A); the first deformed amount estimation section 13-1 then estimates a rotation angle of the image from the projection value of the feature pattern onto the subspace (step 206A). The first estimated error evaluation section 14-1 calculates an estimated error of the rotation angle from the distance between the feature pattern and the subspace (step 207A).

[0065] Meanwhile, in parallel with the processing of steps 204A to 207A, the image held by the image holding section 2 is sent to the second feature pattern generation section 11-2, where it is projected onto a predetermined, e.g., 50-dimensional linear subspace and transformed into a 50-dimensional feature pattern (step 204B). The feature pattern is input to the second subspace projection calculation section 12-2, and the projection value of the feature pattern onto the subspace and the distance between the feature pattern and the subspace are calculated (step 205B). The second deformed amount estimation section 13-2 estimates a degree of shift of the image from the projection value of the feature pattern onto the subspace (step 206B). The second estimated error evaluation section 14-2 calculates an estimated error of the degree of shift from the distance between the feature pattern and the subspace (step 207B).

[0066] Then, the image deformation adjustment section 7 compares the estimated error of the rotation angle calculated by the first estimated error evaluation section 14-1 with the estimated error of the degree of shift calculated by the second estimated error evaluation section 14-2. As a result, when the estimated error of the rotation angle is smaller (NO in step 208), under the adjustment made by the image deformation adjustment section 7, the first image deformation section 6-1 rotates the image of the image holding section 2 in the direction that erects the image (step 209). Meanwhile, when the estimated error of the degree of shift is smaller (YES in step 208), under the adjustment made by the image deformation adjustment section 7, the second image deformation section 6-2 shifts the image of the image holding section 2 in the direction that positions the image at the center (step 210). At this time, when the estimated value of the rotation angle or the degree of shift is less than the prescribed value (closer to the normalized image), the deformation related to that flexibility is not made, and the other deformation is made with higher priority.

[0067] Then, the similarity calculation section 3 calculates the similarity of the image of the image holding section 2 with the reference image (step 211). When the similarity exceeds a prescribed threshold (YES in step 212), it is judged that the input image is identical with the reference (step 213), and the recognition processing is terminated.

[0068] Meanwhile, where the similarity is equal to or below the threshold (NO in step 212), if the number of image manipulations is at or below a prescribed number, namely the counter value j is at or below a prescribed number (NO in step 214), the procedure returns to step 203, and the same processing is performed on the (already rotated or shifted) image of the image holding section 2. If the number of image manipulations exceeds the prescribed number (YES in step 214), it is judged that the input image is different from the reference (step 215), and the recognition processing is terminated.
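The overall loop of steps 201 to 215 can be summarized in a short sketch. It reuses the hypothetical `similarity` helper from the earlier sketch, represents each estimator/deformer pair abstractly, and omits the below-prescribed-value skip of step 208 for brevity; all names are assumptions.

```python
def recognize(image, sections, template, threshold, max_steps=50):
    """Recognition loop of FIG. 9 and FIG. 10. Each entry of `sections` is a
    pair (estimate, deform): estimate(image) returns a (deformed amount,
    estimated error) tuple; deform(image, amount) returns the deformed image."""
    for _ in range(max_steps):                                         # steps 202-203, 214
        results = [(est(image), deform) for est, deform in sections]   # steps 204-207
        (amount, _), deform = min(results, key=lambda r: r[0][1])      # step 208
        image = deform(image, amount)                                  # step 209 or 210
        if similarity(image, template) > threshold:                    # steps 211-212
            return True                                                # step 213
    return False                                                       # step 215
```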

[0069]FIG. 11 shows the changing state of the image held by the image holding section 2 when the processing described with reference to FIG. 9 and FIG. 10 was performed on two types of images of squares rotated by −180 degrees and shifted by 6 pixels in the vertical and horizontal directions. FIG. 12 is a graph showing the progression of the image angle and the deviation from the center when the same recognition processing is performed on eight samples. It is seen from these drawings that, by repeating the rotation and shifting of the input image, an image positioned upright at the center is obtained within several operations.

[0070] There were two image-deformed amount estimation sections 5 in the above description, but the same procedure can be used for recognition processing even if there are three or more. As for the flexibility of image deformation, besides deformation caused by rotation in the depth direction and changes in lighting, any deformation representing facial expressions or differences among individuals can be handled.

[0071] Furthermore, the cumulative value of the deformed amounts can be used to show the deviation from the normalized state of the target, so the state in which the target is positioned can also be recognized.

[0072] According to the embodiment described above, even when the environment of the target changes largely, an image recognition device which can recognize the target can be realized with realistic resources, and learning can be performed with ease by using a small number of learning samples.

[0073] [Embodiment 2]

[0074] The recognition device of Embodiment 2 has the same structure as that of Embodiment 1 shown in FIG. 1; only the processing by the subspace projection calculation section 12, the deformed amount estimation section 13 and the estimated error evaluation section 14 is different. Therefore, only the processing performed by the subspace projection calculation section 12, the deformed amount estimation section 13 and the estimated error evaluation section 14 will be described, and the other structures will not be described here.

[0075] The subspace projection calculation section 12 has learned the subspace which approximates the distribution of the feature patterns of the reference image. Because changes of the feature vector in the real world are very complex, a nonlinear subspace is preferable for approximating the changes. However, instead of learning the nonlinear subspace in the space representing the feature pattern, Embodiment 2 maps the feature pattern into a space of higher dimension than the feature pattern space by a predetermined nonlinear mapping and approximately expresses the nonlinear subspace of the feature pattern space by a linear subspace in the mapped high dimensional space.

[0076] When the feature pattern of the target image is input, the subspace projection calculation section 12 maps the feature pattern into the high dimensional space by the nonlinear mapping, projects it onto the linear subspace in the high dimensional space, and outputs the coordinate values of the projection in the coordinate system on the linear subspace. Here, the projection vector is defined as the point on the linear subspace closest to the nonlinear mapping of the feature pattern. At the same time, the distance between the nonlinear mapping of the feature pattern and the linear subspace is calculated and output.

[0077] Now, the above-described method of determining the projection value and the distance to the subspace will be described in detail. When it is assumed that the feature pattern is a d-dimensional vector x and the nonlinear mapping which maps x into a dΦ-dimensional high dimensional space F is Φ, Expression (1) holds.

$$ \Phi : \mathbb{R}^d \to F, \quad x \mapsto \Phi(x) = \left( \varphi_1(x), \ldots, \varphi_{d_\Phi}(x) \right)^T \qquad (1) $$

[0078] The m-dimensional linear subspace of the high dimensional space is determined by learning in advance. When its base vectors are assumed to be Φ₁, . . . , Φ_m, the projection values α₁, . . . , α_m of the nonlinear mapping of the feature pattern onto the linear subspace are determined as the α₁, . . . , α_m which minimize the distance L between the nonlinear mapping of the feature pattern and a point on the linear subspace, as indicated by Expression (2). The square root of the value L at that time is the distance between the nonlinear mapping of the feature pattern and the linear subspace.

$$ L = \left( \Phi(x) - \sum_{i=1}^{m} \alpha_i \Phi_i \right) \cdot \left( \Phi(x) - \sum_{i=1}^{m} \alpha_i \Phi_i \right) \qquad (2) $$

[0079] However, to express strong nonlinearity in a feature pattern space in general, the number of dimensions of the high dimensional space becomes very large, and it is substantially impossible to calculate Expression (2) directly. Therefore, this embodiment selects a special mapping as the nonlinear mapping Φ so that a technique called the kernel method can be used, which makes it possible to calculate Expression (2) with a realistic computational quantity. Specifically, the nonlinear mapping Φ is selected so as to relate to a predetermined kernel function. Here, the kernel function is a function on the feature pattern space defined by Expression (3); φ₁(x), . . . , φ_dΦ(x) are called the characteristic functions of the kernel function, and λ₁, . . . , λ_dΦ are called its characteristic values.

$$ K(x, y) = \sum_{i=1}^{d_\Phi} \lambda_i \, \varphi_i(x) \, \varphi_i(y) \qquad (3) $$

[0080] As the kernel function, the Gaussian kernel shown by Expression (4) or the polynomial kernel shown by Expression (5) can be used.

$$ K(x, y) = \exp\left( -\|x - y\|^2 / (2\sigma^2) \right) \qquad (4) $$

$$ K(x, y) = (1 + x \cdot y)^d \qquad (5) $$

[0081] The selected nonlinear mapping Φ is expressed as indicated by Expression (6) using the characteristic functions and characteristic values. Besides, the linear subspace is restricted so that the m base vectors of the linear subspace of the high dimensional space are the nonlinear mappings Φ(x₁), . . . , Φ(x_m) of some m vectors x₁, . . . , x_m (hereinafter referred to as preimages) of the d-dimensional feature pattern space.

$$ x = (x_1, \ldots, x_d) \to \Phi(x) = \left( \sqrt{\lambda_1}\,\varphi_1(x), \ldots, \sqrt{\lambda_{d_\Phi}}\,\varphi_{d_\Phi}(x) \right) \qquad (6) $$

[0082] When the relation of Expression (3) is used, Expression (2) becomes Expression (7) in terms of the kernel function. Expression (7) does not explicitly contain vector calculations in the high dimensional space, so it can be calculated with ease. The α₁, . . . , α_m which minimize Expression (7) are determined as the α₁, . . . , α_m which make the differential of L zero and are expressed as indicated by Expression (8). Here, the matrix K is the matrix which has K(x_i, x_j) as its i-row, j-column component. The minimum value of L is determined by substituting α₁, . . . , α_m into Expression (7).

$$ L = K(x, x) - 2 \sum_{i=1}^{m} \alpha_i K(x, x_i) + \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j K(x_i, x_j) \qquad (7) $$

$$ \alpha_i = \sum_{j=1}^{m} K_{ij}^{-1} K(x, x_j) \qquad (8) $$
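Expressions (7) and (8) translate directly into a few lines of numpy. This sketch assumes the Gaussian kernel of Expression (4) and illustrative function names; `np.linalg.solve` plays the role of the inverse matrix K⁻¹ in Expression (8).

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel of Expression (4)."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def project_kernel(x, preimages, kernel=gaussian_kernel):
    """Projection of the nonlinear mapping of x onto the linear subspace
    spanned by the mappings of the preimages x_1..x_m: the coefficients
    alpha follow Expression (8), the squared distance L Expression (7)."""
    K = np.array([[kernel(xi, xj) for xj in preimages] for xi in preimages])
    k_x = np.array([kernel(x, xi) for xi in preimages])
    alpha = np.linalg.solve(K, k_x)                              # Expression (8)
    L = kernel(x, x) - 2.0 * alpha @ k_x + alpha @ K @ alpha     # Expression (7)
    return alpha, float(L)
```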

[0083] Next, the learning rule of the base vectors of the linear subspace will be described. Because each base vector of the linear subspace is assumed always to have a preimage, the learning rule is a learning rule not of the base vector itself but of the preimage. The learning rule of the preimage determines the projection values α₁, . . . , α_m and then moves a preimage x_i in the direction Δx_i which decreases Expression (7) the most. Δx_i is obtained by the so-called steepest descent method and expressed by Expression (9) below.

$$ \Delta x_i = -\eta \, \alpha_i \, G(x)^{-1} \left( \frac{\partial}{\partial x_i} K(x, x_i) - \sum_{j=1}^{m} \alpha_j \frac{\partial}{\partial x_i} K(x_i, x_j) \right) \qquad (9) $$

[0084] Here, η is a learning coefficient, which is a positive constant. G(x) is the metric tensor of the manifold embedded in the high dimensional space by the nonlinear mapping; the metric tensor in Expression (9) serves to correct the direction of steepest descent in the high dimensional space to the direction of steepest descent in the feature pattern space. G(x) can also be expressed by Expression (10) using the kernel function. By Expression (10), because the problem is basically a linear optimization problem in the high dimensional space, learning can be completed in a short time with good convergence, unlike nonlinear optimization.

$$ g_{ab}(x) = \frac{\partial}{\partial x^a} \frac{\partial}{\partial x'^b} K(x, x') \bigg|_{x = x'} \qquad (10) $$
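As a hedged numerical illustration of Expression (10), the mixed derivative can be approximated by central finite differences; for the Gaussian kernel of Expression (4) the result is analytically the identity matrix divided by σ², which the sketch below can be used to confirm. The helper name is hypothetical, and `gaussian_kernel` is the function from the previous sketch.

```python
import numpy as np

def metric_tensor(x, kernel, eps=1e-5):
    """Numerical evaluation of Expression (10) by central finite differences:
    g_ab(x) = d/dx_a d/dx'_b K(x, x') evaluated at x = x'."""
    d = len(x)
    g = np.zeros((d, d))
    for a in range(d):
        for b in range(d):
            ea, eb = np.eye(d)[a] * eps, np.eye(d)[b] * eps
            g[a, b] = (kernel(x + ea, x + eb) - kernel(x + ea, x - eb)
                       - kernel(x - ea, x + eb) + kernel(x - ea, x - eb)) / (4 * eps ** 2)
    return g

# For gaussian_kernel with sigma = 1, metric_tensor(x, gaussian_kernel)
# is approximately the identity matrix, as the analytic form predicts.
```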

[0085] Next, the deformed amount estimation section 13 will be described. The deformed amount estimation section 13 has previously learned the relationship between the image-deformed amount and the projection, output from the subspace projection calculation section 12, of the feature vector onto the subspace, and it uses this to determine and output the image-deformed amount from the projection value. The relationship between the projection onto the subspace and the image-deformed amount is as shown in FIG. 13 when the flexibility is rotation, for example. In FIG. 13, a given projection component is plotted with respect to the angle of the target; each component reacts most strongly to a particular angle, and the reaction becomes weaker as the angle deviates from it. The combinations of base number i and the image-deformed amount θ(i) which maximizes the projection component onto that base are stored as a lookup table in the deformed amount estimation section 13. The image-deformed amount is then determined, as in Expression (11), by taking, for example, the weighted mean of the values θ(i) using the inputs α₁, . . . , α_m from the subspace projection calculation section 12.

$$ \theta = \frac{\sum_{i=1}^{m} \alpha_i \, \theta(i)}{\sum_{i=1}^{m} \alpha_i} \qquad (11) $$
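Expression (11) is a one-line weighted mean; the sketch below is illustrative and its names are assumptions.

```python
import numpy as np

def estimate_deformed_amount(alpha: np.ndarray, theta: np.ndarray) -> float:
    """Expression (11): weighted mean of the per-basis deformed amounts
    theta(i), weighted by the projection values alpha_i."""
    return float(alpha @ theta / alpha.sum())
```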

[0086] The estimated error evaluation section 14 calculates L, the square of the distance between the nonlinear mapping of the feature pattern and the linear subspace, by Expression (7) and outputs it as the precision of the image-deformed amount estimated by the deformed amount estimation section 13. Because the feature pattern can be considered well approximated when L is small, it is appropriate to use L as the precision of the image-deformed amount. It is to be understood that any monotone increasing function of L can be used.

[0087] Because the samples required for learning, the learning procedure of the recognition device and the recognition procedure in Embodiment 2 are the same as in Embodiment 1, their descriptions are omitted.

[0088] [Embodiment 3]

[0089] In Embodiment 3, a particular voice, for example, “a”, is recognized instead of an image. The recognition device of Embodiment 3 has the same structure as that of Embodiment 1 shown in FIG. 1 except that the “image” is replaced by “voice”.

[0090] Therefore, the structure of Embodiment 3 will not be described here. Because the differences from Embodiment 1 are the input pattern expression method and the input pattern deformation method, they will be described with reference to FIG. 14. Input voices are sampled at prescribed time intervals and held as a discrete expression in time and as frequency expressions which are Fourier-transformed in a time region designated by a Fourier window. Embodiment 3 has two pattern deformation sections, and their corresponding pattern-deformed amount estimation sections, for the held input. The first pattern deformation section changes the frequency expression to shift the frequency. Meanwhile, the second pattern deformation section changes the time expression and elongates the pattern in the direction of the time axis. By having the above two pattern deformation sections, the device can deal with a change in the pitch of the produced voice and a change in the speed of production.
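The two pattern deformations might be sketched as follows, assuming a single Fourier-windowed frame and a plain waveform; rolling the rfft spectrum and linear resampling are crude illustrative stand-ins for the frequency shift and time elongation described above, not the patented method.

```python
import numpy as np

def shift_frequency(frame: np.ndarray, bins: int) -> np.ndarray:
    """First pattern deformation: shift the frequency expression of one
    Fourier-windowed frame by an integer number of frequency bins."""
    spectrum = np.fft.rfft(frame)
    return np.fft.irfft(np.roll(spectrum, bins), n=len(frame))

def stretch_time(signal: np.ndarray, factor: float) -> np.ndarray:
    """Second pattern deformation: elongate the time expression by
    resampling the waveform along the time axis."""
    old_t = np.arange(len(signal))
    new_t = np.linspace(0.0, len(signal) - 1.0, int(len(signal) * factor))
    return np.interp(new_t, old_t, signal)
```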

[0091] Because the learning and recognition procedures are the same as in Embodiment 1, their descriptions are omitted.

[0092] In the above-described embodiments, the input of images and voices was described, but the input is not limited to them. Any type of information expressible as feature patterns, such as taste, the sense of smell and the sense of touch, can be applied. For an image, colors can be used for judgment; for a voice, words can be used for judgment, and sounds produced by a musical instrument can also be used for judgment.

What is claimed is:
1. A recognition device which judges whether a target is identical with a predetermined reference, comprising: a holding means which holds the target; multiple deformation means which deform the target held by the holding means with at least one degree of flexibility in deformation; multiple deformed amount estimation means which correspond to the deformation means in a one-to-one relationship and estimate a deformed amount of the target from the reference with respect to the flexibility in deformation according to the corresponding deformation means; an estimated error evaluation means which evaluates an estimated error of the deformed amount estimated by the deformed amount estimation means; an adjustment means which operates any of the deformation means with precedence according to the estimated error evaluated by the estimated error evaluation means; a similarity calculation means which calculates a similarity between the reference and the target which is deformed by the deformation means operated with precedence by the adjustment means; and a judgment means which judges whether the target is identical with the reference according to the similarity calculated by the similarity calculation means.
2. The recognition device according to claim 1, wherein: the holding means holds the target deformed by the deformation means; and the deformation means, the deformed amount estimation means, the estimated error evaluation means and the adjustment means repeatedly deform the target when it is judged by the judgment means that the target held by the holding means is not identical with the reference.
3. The recognition device according to claim 1, wherein: the deformed amount estimation means have a subspace learned so as to approximate a distribution of a predetermined learning model, determine the point on the subspace closest to the target and indicate the deformed amount by coordinates on the subspace indicating the determined point; and the estimated error evaluation means indicates the estimated error by a distance between the target and the subspace.
4. The recognition device according to claim 3, wherein a value of coordinates on the subspace corresponds to the deformed amount in a one-to-one relationship.
5. The recognition device according to claim 3, wherein: a particular coordinate axis on the subspace corresponds to a particular value of the deformed amount in a one-to-one relationship, and the deformed amount estimation means determines the deformed amount by weighting the particular values by the corresponding coordinate values.
6. The recognition device according to claim 3, wherein the deformed amount estimation means learns the subspace by an identity mapping neural network, indicates the coordinates of the point on the subspace closest to the target by an output value of an intermediate layer of the neural network for the target and indicates a distance between the subspace and the point indicating the target by a difference between the input and output of the neural network.
7. The recognition device according to claim 3, wherein the deformed amount estimation means maps the learning model into a space of higher dimension than the learning model by nonlinear calculation and indicates the subspace as a linear subspace of the high dimensional space.
8. The recognition device according to claim 7, wherein the nonlinear calculation is determined by a particular kernel function, a base vector of the subspace is restricted to be the nonlinear calculation of a particular target, and the coordinates on the subspace and a distance between the target and the subspace are indicated by the kernel function.
9. The recognition device according to claim 1, wherein the deformed amount estimation means learn by using multiple learning models generated by deforming the reference by at least one of the deformation means.
10. The recognition device according to claim 1, wherein the deformed amount estimation means learn independently from one another.
11. The recognition device according to claim 1, wherein the deformed amount estimation means learn in competition at the same time.
12. The recognition device according to claim 1, wherein the target and the reference are images.
13. The recognition device according to claim 1, wherein the target and the reference are voices.
14. A recognition method for determining whether a target is identical with a predetermined reference, comprising: estimating a deformed amount of the target from the reference according to multiple degrees of freedom for deformation; evaluating an estimated error of the estimated deformed amount; deforming the target with the degree of freedom for deformation having the evaluated estimated error at a minimum level; calculating a similarity between the deformed target and the reference; and judging whether the target is identical with the reference according to the calculated similarity.
15. The recognition method according to claim 14, wherein, when it is judged that the target is not identical with the reference, the target is deformed repeatedly.
16. The recognition method according to claim 14, wherein: the deformed amount is indicated by coordinates, on a subspace learned so as to approximate a distribution of a predetermined learning model, of the point on the subspace determined as closest to the target; and the estimated error is evaluated by the distance between the target and the subspace.
17. The recognition method according to claim 14, wherein, when the similarity is equal to or larger than a predetermined value, the target and the reference are determined to be identical, and a state of the target is recognized from the cumulative value of the deformed amounts of the image.
18. The recognition method according to claim 14, wherein the target and the reference are images.
19. The recognition method according to claim 14, wherein the target and the reference are voices.